# load_datasets_in_python

## preimport.py

```py3
import pandas as pd
import sklearn
import statsmodels
import matplotlib.pyplot as plt
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline
```

## data sets

- Using R datasets through statsmodels functions
    - http://vincentarelbundock.github.io/Rdatasets/datasets.html
- Datasets provided by scikit-learn
    - Reshaped with pandas so they are easier to read
    - http://scikit-learn.org/stable/datasets/#toy-datasets

### dataset_statsmodels.py

```py3
import statsmodels.api as sm

# Downloads the dataset from the Rdatasets repository (needs network access).
# Note: newer Rdatasets indexes list Duncan under the "carData" package;
# if the lookup with "car" fails, try "carData" instead.
duncan_prestige = sm.datasets.get_rdataset("Duncan", "car")
duncan_prestige.data.head()

# Output:
#             type  income  education  prestige
# accountant  prof      62         86        82
# pilot       prof      72         76        83
# architect   prof      75         92        90
# author      prof      55         90        76
# chemist     prof      64         86        90
```

### dataset_sklearn.py

```py3
# https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/base.py
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
# Feature matrix as a DataFrame with the original column names
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
# Attach the integer class codes, then map them to species names
iris_data["target"] = pd.Series(iris.target)
iris_data["target"] = iris_data["target"].apply(lambda x: iris.target_names[x])
iris_data.head()

# Output:
#    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
# 0                5.1               3.5                1.4               0.2  setosa
# 1                4.9               3.0                1.4               0.2  setosa
# 2                4.7               3.2                1.3               0.2  setosa
# 3                4.6               3.1                1.5               0.2  setosa
# 4                5.0               3.6                1.4               0.2  setosa
```
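
The iris conversion above can be reused for other scikit-learn toy datasets. The following is a minimal sketch, not part of the original snippets: the helper name `bunch_to_frame` is hypothetical, and it assumes the dataset is a classification `Bunch` that exposes `data`, `feature_names`, `target`, and `target_names` (as `load_iris` and `load_wine` do).

```python
import numpy as np
import pandas as pd
from sklearn import datasets


def bunch_to_frame(bunch):
    """Hypothetical helper: turn a classification Bunch into a labelled DataFrame."""
    frame = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    # Map the integer class codes to their names with NumPy integer indexing.
    frame["target"] = np.asarray(bunch.target_names)[bunch.target]
    return frame


# Both datasets ship with scikit-learn, so no network access is needed.
iris_frame = bunch_to_frame(datasets.load_iris())
wine_frame = bunch_to_frame(datasets.load_wine())

print(iris_frame.head())
print(wine_frame["target"].unique())
```

Indexing `target_names` with the whole `target` array replaces the per-row `apply` call with a single vectorized lookup, while producing the same kind of labelled column.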